Many applications today require support for more than
one language. Even if you're writing an application that sees use only
in a single country, the chances are high that people who speak other
languages will use it and will require the prompts and other application
features in another language. The following sections describe some of
the principles of using the new Extended Linguistic Services (ELS) in
Windows 7.
1. Understanding the Role of Extended Linguistic Services
Over the years, developers
have relied on a host of techniques for adding language support to their
applications. For example, developers have used Locale Identifiers
(LCIDs) for many years. You can see a list of LCIDs at http://krafft.com/scripts/deluxe-calendar/lcid_chart.htm.
Typically, the developer places the resources for a particular language
in a subdirectory that uses the number of the language it supports,
such as 1033 for American English or 2058 for Mexican Spanish. Of
course, to get full LCID support, you also need to know that the
hexadecimal equivalent of 1033 is 409, because Microsoft often lists
American English as 409. In addition, you need to know that American English
uses code page 1252. (Yes, it can get pretty confusing; see the charts
at http://www.science.co.il/language/locale-codes.asp and http://www.i18nguy.com/unicode/codepages.html.)
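The relationship between the decimal LCID, its hexadecimal form, and the code page can be checked from code. The following is a minimal sketch that uses the .NET System.Globalization classes rather than anything ELS-specific. The class name LcidSketch, its method names, and the idea of building a folder path from the LCID are assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class LcidSketch
{
    // Converts a decimal LCID to the hexadecimal form used in many charts.
    public static string ToHex(int lcid)
    {
        return lcid.ToString("X", CultureInfo.InvariantCulture);
    }

    // Builds a resource folder path named after the decimal LCID
    // (hypothetical layout, matching the subdirectory convention above).
    public static string ResourceFolder(string root, int lcid)
    {
        return Path.Combine(root, lcid.ToString(CultureInfo.InvariantCulture));
    }

    // Describes an LCID using the culture data the platform provides.
    public static string Describe(int lcid)
    {
        CultureInfo culture = new CultureInfo(lcid);
        return string.Format(CultureInfo.InvariantCulture,
            "{0} = 0x{1}, name {2}, ANSI code page {3}",
            lcid, ToHex(lcid), culture.Name, culture.TextInfo.ANSICodePage);
    }
}
```

Calling LcidSketch.ToHex(1033) returns "409" and LcidSketch.ToHex(2058) returns "80A". The name and code page that Describe() reports come from the culture data installed on the machine, so they may vary between platforms.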
The LCIDs go only so far in
solving the problem of support for multiple languages, though, and they
don't provide an overall solution. In fact, the result is a
hodgepodge because different developers support different
languages. Microsoft saw a need to add operating-system-level support
for other languages and has done so using the National Language Support
(NLS) functionality that developers have used until now (see http://msdn.microsoft.com/library/dd319078.aspx
for details). LCIDs are also abstract numbers, so Microsoft has implemented
Internet Engineering Task Force (IETF)-conformant string-based
identifiers, such as en-US (see http://www.faqs.org/rfcs/rfc4646.html).
Even with NLS, though,
the developer still encounters problems because simply providing
prompts in the right language isn't enough. An application today needs
to perform a number of tasks correctly for each language it supports,
such as:
Presentation of numeric data
Presentation of dates and time
Data sorting
Keyboard layout support
Printing support
Translation between data formats (not necessarily the language itself, but the data presentation, such as the format of a date or time)
Trying to code all these
required tasks by hand can be error-prone and time-consuming. ELS is a
new Windows 7 API that makes it easier to perform these tasks by
reducing the amount of work the developer needs to do to obtain the
desired result.
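To see why these tasks differ from one language to the next, it helps to look at a few of them in code. The following sketch uses the standard .NET System.Globalization classes, not ELS, to format a number, format a date, and sort strings for a named culture. The class and method names are assumptions made for illustration.

```csharp
using System;
using System.Globalization;

public static class CultureFormattingSketch
{
    // Formats a number with two decimals using the named culture's conventions.
    public static string FormatNumber(double value, string cultureName)
    {
        return value.ToString("N2", CultureInfo.GetCultureInfo(cultureName));
    }

    // Formats a date using the named culture's short date pattern.
    public static string FormatDate(DateTime value, string cultureName)
    {
        return value.ToString("d", CultureInfo.GetCultureInfo(cultureName));
    }

    // Returns a copy of the input sorted by the named culture's collation rules.
    public static string[] Sort(string[] words, string cultureName)
    {
        string[] sorted = (string[])words.Clone();
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        Array.Sort(sorted, StringComparer.Create(culture, false));
        return sorted;
    }
}
```

For example, CultureFormattingSketch.FormatNumber(1234.5, "en-US") produces 1,234.50, while the same value formatted for a culture such as de-DE uses different group and decimal separators. Every item in the list above needs this kind of per-culture handling.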
2. Configuring the Extended Linguistic Services Example
The Extended Linguistic
Services example starts with a Windows Forms application. To make it
easy to see how the language detection features of ELS work, the example
provides a textbox for entry and a list box for output. In some cases,
such as detecting Spanish or Portuguese, you'll actually see multiple
language possibilities as output, which makes the list box convenient.
There's also a test button that initiates the language detection check.
You'll use the Code Pack for this example and will need to add the
following reference:
Microsoft.WindowsAPICodePack.ExtendedLinguisticServices.DLL
The example also requires the following using statement:
using Microsoft.WindowsAPICodePack.ExtendedLinguisticServices;
3. Adding Extended Linguistic Services to an Application
There's one question that seems
to vex developers everywhere: how to detect the language the user is
speaking. It's a problem because the user often won't be able to
communicate the language easily, or your application may not support the
user's language directly. For that matter, the user may be another
computer and not a human user at all. This particular service, language
detection, is one that just about any developer can use. Example 1 shows the code for this example.
Example 1. Detect the user's language
private void btnTest_Click(object sender, EventArgs e)
{
    // Clear the previous language list.
    lbOutput.Items.Clear();

    // Create a mapping service.
    MappingService LangDetect = new MappingService(
        MappingAvailableServices.LanguageDetection);

    // Perform language detection.
    using (MappingPropertyBag Bag =
        LangDetect.RecognizeText(txtInput.Text, null))
    {
        // Obtain the list of candidates as a string array.
        String[] Languages =
            Bag.GetResultRanges()[0].FormatData(
                new StringArrayFormatter());

        // Process each of the languages in the list.
        foreach (String Language in Languages)
            lbOutput.Items.Add(Language);
    }
}
The example begins by clearing the list box, lbOutput, so that you see just the languages associated with the current string, which appears in txtInput.
The next step is to create a MappingService object, LangDetect. The MappingService() constructor requires an input telling it what type of service to create. The available services appear in the MappingAvailableServices enumeration. The example uses the LanguageDetection service. The MappingService also supports script detection and a number of transliteration services.
Detecting a language involves creating a map of the language properties. The code creates a MappingPropertyBag object, Bag, to hold these properties. The LangDetect.RecognizeText() method outputs the properties it detects in the sample text found in txtInput.Text and places them in Bag.
At this point, you want to extract the data from Bag. The GetResultRanges() method outputs an array of MappingDataRange objects. When detecting a language, the output array contains only one element, which contains the list of detected languages.
To get these languages out of the MappingDataRange array element, the code calls FormatData() with a new StringArrayFormatter object. The result is a string array that you can then process using a foreach loop.
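The same create, recognize, and format pattern should carry over to the other services that MappingService supports. The following is a rough sketch of script detection. It assumes the Code Pack exposes a MappingAvailableServices.ScriptDetection identifier, a NullTerminatedStringFormatter class, and StartIndex and EndIndex properties on MappingDataRange, as the Code Pack samples suggest. The class and method names are made up for illustration, and the code needs the same Code Pack reference on Windows 7 or later.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.WindowsAPICodePack.ExtendedLinguisticServices;

public static class ScriptDetectionSketch
{
    // Returns one "start-end: script" entry for each range the service reports.
    public static List<string> DetectScripts(string text)
    {
        List<string> results = new List<string>();

        // Assumption: ScriptDetection is available alongside LanguageDetection.
        MappingService scriptDetect = new MappingService(
            MappingAvailableServices.ScriptDetection);

        using (MappingPropertyBag bag = scriptDetect.RecognizeText(text, null))
        {
            // Unlike language detection, script detection can report several
            // ranges, one for each run of text written in a single script.
            foreach (MappingDataRange range in bag.GetResultRanges())
            {
                // Assumption: each range carries a single script name.
                string script = range.FormatData(
                    new NullTerminatedStringFormatter());
                results.Add(string.Format("{0}-{1}: {2}",
                    range.StartIndex, range.EndIndex, script));
            }
        }

        return results;
    }
}
```

A button handler could add each returned string to lbOutput, just as Example 1 does with the language list.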
The detection is perfect in some cases. For example, if you type Hello World,
you get en (English) as the only output. The language detector can't
tell you what kind of English (such as en-US for American English),
simply that it's English. On the other hand, if you type Hola a todos, the detector can't make a precise determination, as shown in Figure 1.
In this case, you'd still need to select between Spanish and
Portuguese. (The detector also provides both Spanish and Portuguese as
output for Olá Mundo.) However, at least your list of choices is shorter than if you had to figure out the correct language from scratch.
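One way to make the final choice when the detector returns several candidates is to compare them against the languages your application actually supports. The following is a minimal sketch of that idea. It assumes the detector lists the most likely candidate first. The CandidateFilterSketch class, the PickLanguage() method, and the fallback parameter are hypothetical names that aren't part of ELS.

```csharp
using System;
using System.Collections.Generic;

public static class CandidateFilterSketch
{
    // Returns the first supported language that matches a detected candidate,
    // or the fallback when none of the candidates is supported.
    public static string PickLanguage(
        IEnumerable<string> candidates,
        IEnumerable<string> supported,
        string fallback)
    {
        foreach (string candidate in candidates)
        {
            // Compare only the primary subtag so that "es" matches "es-MX".
            string primary = candidate.Split('-')[0];

            foreach (string language in supported)
            {
                if (string.Equals(language.Split('-')[0], primary,
                    StringComparison.OrdinalIgnoreCase))
                {
                    return language;
                }
            }
        }

        return fallback;
    }
}
```

With the candidates es and pt and an application that supports en-US and pt-BR, PickLanguage() returns pt-BR. If neither candidate is supported, it returns the fallback you supply.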
It's interesting to see how accurate the language detection is for ELS. For example, try typing Hallo Welt to see what language you get. ELS is still a little limited in what it can detect. It handled a Chinese sample with aplomb, but failed miserably with a Hindi sample.
The point is that this tool is relatively easy to use, and it does a
better job than the tools that most developers had in the past.